The California Department of Public Health detected a novel infectious respiratory disease outbreak in California between May and December 2023, and collected information about the number of cases and case severity, along with demographic information on infected individuals. This report examines the course of the outbreak and whether it disproportionately affected certain demographic or geographic populations. In particular, race/ethnicity, county, and age are examined to identify populations that may benefit most from prevention and treatment resources.
This data source is a simulated dataset of weekly infectious disease cases for each county in California, as reported by public health agencies and organizations such as county health departments, from late May 2023 through the end of December 2023. The infection data are linked with demographic information: age group, binary gender, and race and ethnicity.
This is a simulated dataset of reported weekly cases of the disease for Los Angeles County, categorized by date of diagnosis and patient demographics, with cumulative totals for infected, unrecovered, and severe cases. Such data would have been collected by public health agencies across Los Angeles County. Data were collected from late May 2023 until the end of December 2023. The dataset provides patient demographics (age group, binary gender, and race and ethnicity), which help to answer whether there are disparities in the rate of infection among different populations.
This dataset is simulated but approximates what population data from the State of California might look like. It includes 2023 population estimates from the California Department of Finance, broken down by county and demographic category (age, race, and sex).
First, the two disease infection datasets (Sources 1 and 2) were combined by binding their rows, that is, stacking the rows of both datasets into a single table. This produced a combined list of infection data for all counties in California.
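The row-binding step can be sketched as follows. This is a minimal illustration in Python with pandas, not the code used for this report. The two small tables are made-up stand-ins for Sources 1 and 2, and the column names new_infections and new_severe are assumptions; only county and race_ethnicity are named in this report. It also assumes both sources already share the same column names.

```python
import pandas as pd

# Made-up stand-in for Source 1 (statewide weekly cases by county).
# Column names new_infections and new_severe are hypothetical.
statewide = pd.DataFrame({
    "county": ["Alameda", "Alameda"],
    "race_ethnicity": ["Asian", "White"],
    "new_infections": [5, 7],
    "new_severe": [1, 0],
})

# Made-up stand-in for Source 2 (Los Angeles County weekly cases).
los_angeles = pd.DataFrame({
    "county": ["Los Angeles"],
    "race_ethnicity": ["Asian"],
    "new_infections": [12],
    "new_severe": [2],
})

# Row-bind: stack the rows of both tables into one.
# ignore_index=True renumbers the rows of the combined table from 0.
infections = pd.concat([statewide, los_angeles], ignore_index=True)
print(infections)
```

If the real sources name or code their columns differently, they would need to be harmonized before stacking; otherwise the mismatched columns are filled with missing values.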
Next, strata of interest were identified. Since we’re interested in the distribution of infections across race and geographic categories, the rows were grouped by county and then by race/ethnicity, and the infection counts in each stratum were summed. Because both datasets record new infections and new severe infections in separate columns, each of these two columns was summed separately within each stratum.
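As an illustration of the stratified summing, here is a minimal pandas sketch. The rows are invented for the example, and the count column names new_infections and new_severe are assumptions rather than the actual field names in the datasets.

```python
import pandas as pd

# Made-up weekly rows; new_infections and new_severe are hypothetical column names.
infections = pd.DataFrame({
    "county": ["Alameda", "Alameda", "Alameda", "Los Angeles"],
    "race_ethnicity": ["Asian", "Asian", "White", "Asian"],
    "new_infections": [5, 3, 7, 12],
    "new_severe": [1, 0, 0, 2],
})

# Group by county, then race/ethnicity, and sum each count column per stratum.
# as_index=False keeps the grouping keys as ordinary columns.
strata = (
    infections
    .groupby(["county", "race_ethnicity"], as_index=False)[["new_infections", "new_severe"]]
    .sum()
)
print(strata)
```

Each output row is one county by race/ethnicity stratum, with one total for new infections and one for new severe infections.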
This stratified summary of new and severe case counts was left joined with the California population dataset, using the county and race_ethnicity categories as keys. The resulting table shows, for each stratum, the summed count of new infections, the summed count of new severe infections, and the total population.
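A left join of this kind can be sketched in pandas as below. The values are invented, and the column names new_infections, new_severe, and population are assumptions; the keys county and race_ethnicity are the ones named above. The sketch also assumes the population table has already been reduced to one row per county and race/ethnicity stratum.

```python
import pandas as pd

# Made-up stratified case summary (one row per county x race/ethnicity stratum).
strata = pd.DataFrame({
    "county": ["Alameda", "Alameda", "Los Angeles"],
    "race_ethnicity": ["Asian", "White", "Asian"],
    "new_infections": [8, 7, 12],
    "new_severe": [1, 0, 2],
})

# Made-up population table; one stratum is deliberately missing.
population = pd.DataFrame({
    "county": ["Alameda", "Los Angeles"],
    "race_ethnicity": ["Asian", "Asian"],
    "population": [1000, 5000],
})

# Left join: keep every case stratum and attach its population where a match exists.
# validate="one_to_one" raises an error if either table repeats a key pair.
merged = strata.merge(
    population,
    on=["county", "race_ethnicity"],
    how="left",
    validate="one_to_one",
)
print(merged)
```

Because the join is a left join, strata with no matching population row are kept with a missing population value, which makes unmatched categories easy to spot before any rates are computed.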
To calculate the ra